Plant-Wide Neurocontrol of the Tennessee Eastman Challenge Process Using Evolutionary Reinforcement Learning

Author

  • C. Aldrich
Abstract

The Tennessee Eastman (TE) Control Challenge proposed by Downs and Vogel [1] is a test bed problem for evaluating advanced process control methodologies from a plant-wide perspective. The dynamic model for the process (based on an actual industrial process) integrates the operation of five unit operations, viz. an exothermic two-phase reactor, a partial condenser, a centrifugal compressor, a flash drum and a reboiled stripper. The process incorporates 41 process variables and 12 manipulated variables. Several researchers have considered various controller designs for the process, including multi-loop Single-Input-Single-Output (SISO) control strategies, Dynamic Matrix Control (DMC) and linear/nonlinear Model Predictive Control (MPC). These approaches entail a significant degree of design effort, in that the selection of the optimal (economic) set points and the optimal pairing of process and manipulated variables are non-trivial tasks. Furthermore, linear controller designs may not be optimal in dealing with the nonlinear behaviour inherent in the process dynamics. The success of biological organisms in controlling complex and uncertain environments may serve as significant motivation for a more biological (as opposed to algorithmic) approach to developing control strategies. Evolutionary reinforcement learning provides a framework for the development of control policies from direct cause-effect interactions with a simulated dynamic environment. In this paper the use of a novel evolutionary reinforcement learning algorithm, SANE (Symbiotic, Adaptive Neuro-Evolution), is demonstrated for the development of neural network controllers. SANE performs a global search for an optimal plant-wide control strategy. It is a genetic algorithm based on implicit fitness sharing, which requires neurons in the genetic population to cooperate (by forming a neural network) in performing the required control task. This cooperative approach ensures that genetic diversity is maintained in the population, which favours a continued global search on the problem at hand. Moreover, several parallel searches at the neuron level for different aspects of the control solution should be more effective than a single search for the entire solution. The SANE algorithm offers several key advantages over conventional controller design approaches. Near-optimal neurocontrollers were developed without prior knowledge of the optimal (economic) set points. The pairing of process and manipulated variables and the elimination of control interactions were implicit in the SANE search process. All process and manipulated variables were available to SANE for neurocontroller development, whereas other approaches constrain the number of manipulated variables used for controller development. The neurocontroller was also developed using the open loop unstable process, whereas a nonlinear MPC design required prestabilisation of the plant with PI controllers. The robust, high performance of the neurocontrollers during set point changes and in the presence of disturbances is verified. As shown for the Tennessee Eastman Control Challenge, evolutionary reinforcement learning may be used in process control to establish a plant-wide control strategy by developing neural network controllers with minimal prior process analysis.
Introduction

Downs and Vogel [1] developed the Tennessee Eastman (TE) Control Challenge as a realistic process model for testing control methodologies. The Challenge offers numerous opportunities for control studies, of which the exploration of multivariable control, optimisation and nonlinear control methodologies are most pertinent to this paper [1].

Several factors make the TE process suitable for this plant-wide control study. The process model was developed from an actual industrial process, so that simulated results closely approximate what may be expected in reality. A large number of interacting process and manipulated variables are incorporated into the model, making it a truly significant plant-wide control problem. The model contains both integrating (vessel levels) and self-regulatory (plant pressures) subsystems, and the presence of a recycle stream compounds the control problem. The model allows for the simulation of a wide range of disturbances, from sticking valves to random process upsets to the loss of key feed streams. Plant dynamic behaviour extends to the control valves, which also exhibit a transient response. The process variables comprise continuous variables (temperatures, levels and pressures) and discrete variables (analyser outputs) with different sample periods [1].

Of greater interest than the sheer scope of the control problem is the opportunity it offers to compare various controller design methodologies. Various approaches have been considered in the past, such as multi-loop Single-Input-Single-Output (SISO) control strategies, Dynamic Matrix Control (DMC) and linear/nonlinear Model Predictive Control (MPC). A brief analysis of the difficulties associated with a multi-loop SISO design strategy and a nonlinear Model Predictive Control methodology follows.

This paper focuses on the development of a centralised control system design, in sharp contrast to the decentralised approach taken in multiple-SISO control loop designs. In a multiple-SISO design the problem needs to be decomposed into a number of design stages to make the design process manageable. The design approach also requires a degree of engineering judgement, born of experience, for pairing process and manipulated variables. Intimate knowledge of the plant dynamics and of the relative proportions of expected flow rates and rates of change in process variables is required before any such design may be undertaken. A large number of different techniques need to be utilised, such as Bristol's relative gain array, the Niederlinski index, linear saturation analysis, nonlinear disturbance and saturation analysis and, finally, dynamic simulation (a sketch of the first two tools appears below). Moreover, loop tuning and the selection of appropriate process-manipulated variable pairings need to be considered in the presence and absence of noise and disturbances [2]. For the multi-loop SISO design several iterations of the design procedure may thus be required. Loops may also need to be tailored to deal with expected or known disturbances, which gives no guarantee of the control system's performance in the presence of unknown or unexpected disturbances. The trade-off in obtaining robust control generally results in a loss of loop performance as a result of detuning.
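To give a concrete sense of two of the pairing tools mentioned above: Bristol's relative gain array is Λ = G ∘ (G⁻¹)ᵀ (the elementwise product of the steady-state gain matrix with the transpose of its inverse), and the Niederlinski index for a diagonal pairing is NI = det(G) / Π gᵢᵢ. The sketch below is an illustrative aside, not part of the original study; the example gain matrix is hypothetical.

    import numpy as np

    def relative_gain_array(G):
        # Bristol's RGA: elementwise product of G with inv(G) transposed.
        return G * np.linalg.inv(G).T

    def niederlinski_index(G):
        # NI = det(G) / product of the diagonal (paired) gains.
        # NI <= 0 rules the pairing out under integral control.
        return np.linalg.det(G) / np.prod(np.diag(G))

    # Hypothetical 2x2 steady-state gain matrix for a candidate pairing.
    G = np.array([[2.0, 1.5],
                  [1.0, 2.5]])
    print(relative_gain_array(G))   # diagonal terms ~1.43
    print(niederlinski_index(G))    # 0.7 -> positive, pairing not ruled out

Pairings whose diagonal relative gains are close to one, with a positive Niederlinski index, are the candidates a multi-loop designer would retain; for a plant the size of the TE process this screening must be repeated over many candidate pairings, which is part of the design burden described above.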
Further, only PI controllers are used throughout the design process, and although PI control is appropriate for a large proportion of control problems, severe nonlinearities around set points may significantly degrade the performance of linear controllers. The tuning parameters may thus only be appropriate over a limited portion of the desired operating range. As suggested by McAvoy and Ye [2], their multiple-SISO design may serve as a platform from which an advanced predictive control system may be developed. It has been demonstrated that SISO and linear MPC-type algorithms are insufficient for dealing with the full range of possible process conditions unless overrides and logic operators are added, and such overrides may consume a large portion of the design effort. In essence, SISO strategies are not appropriate for dealing with multiple, interacting constraints as posed by the TE problem.

A nonlinear Model Predictive Control (NMPC) approach may be considered to overcome these shortcomings [3]. The TE process model is open loop unstable, and although NMPC is possible for unstable plants, the complexity of the design procedure escalates considerably, making stabilisation of the plant with SISO controllers necessary. Before NMPC may be considered, the multiple-SISO control problem thus first needs to be solved. As described above, the SISO system needed to stabilise the open loop plant is non-trivial, which makes the control system design a huge undertaking once advanced control is also considered. In this NMPC scheme the manipulated variables are the set points of lower level control loops. Poorly tuned PI loops may therefore impact negatively on the performance of the NMPC controller, as the SISO loops change the dynamics of the system on which the NMPC controller is designed. Should the SISO controllers slow the dynamics of the process, the settling times for set point changes under the NMPC controller may be unnecessarily sluggish [3].

An NMPC design entails significant design effort, and particular care needs to be taken in the modelling and NMPC formulation. The main stumbling block is model development, as the formulation of a useful (simplified) nonlinear model for incorporation in the control loop is relatively difficult and time consuming. Analytical tools are available to support multiple-SISO loop pairings, but have far less value for NMPC designs. In the NMPC design approach the critical process-manipulated variable pairings also require engineering judgement and experimentation. This introduces some uncertainty as to whether an optimal choice of control pairings has been made, which impacts on the rest of the design [3]. In the development of an NMPC controller the optimal steady state for each operating mode also needs to be found. This required solving a nonlinear programming problem using an augmented Lagrangian strategy. The optimisation results played an important part in selecting which manipulated variables should be used, so that the NMPC design considered only 8 of the possible 12 manipulated variables [4].
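For orientation, the following is a minimal receding-horizon sketch of the kind of optimisation an NMPC controller solves at every control interval. It is not the formulation of [3]: the model f, the horizon, the quadratic cost, the move penalty and the valve bounds are all illustrative placeholders.

    import numpy as np
    from scipy.optimize import minimize

    def nmpc_move(x0, u_prev, f, x_ref, horizon=10, dt=0.05):
        """One receding-horizon step: optimise a sequence of manipulated-
        variable settings over the horizon, then apply only the first.
        f(x, u) is the (simplified) nonlinear model's state derivative."""
        n_u = u_prev.size

        def cost(u_flat):
            u_seq = u_flat.reshape(horizon, n_u)
            x, J = x0.copy(), 0.0
            for k in range(horizon):
                x = x + dt * f(x, u_seq[k])            # forward-Euler prediction
                J += np.sum((x - x_ref) ** 2)          # set point tracking error
                du = u_seq[k] - (u_seq[k - 1] if k else u_prev)
                J += 0.1 * np.sum(du ** 2)             # penalise aggressive moves
            return J

        u0 = np.tile(u_prev, horizon)                  # warm start at current inputs
        res = minimize(cost, u0, method="SLSQP",
                       bounds=[(0.0, 100.0)] * (horizon * n_u))  # valve limits in %
        return res.x[:n_u]                             # first move of the plan

The repeated solution of this nonlinear programme at every control interval is what makes the quality of the simplified model so critical, echoing the design burden noted above.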
The only other implementation of reinforcement learning in the chemical engineering literature is by Hoskins and Himmelblau [5], who used the Adaptive Heuristic Critic algorithm, a dynamic programming technique in reinforcement learning, for temperature control in a continuous stirred tank reactor with cooling. The objective was to learn to control the flow rate through the cooling coils so that the reactor temperature is maintained within a specified tolerance. This constitutes a simple SISO control problem with one process variable and one manipulated variable. Owing to the large number of learning trials for such a small control problem, Hoskins and Himmelblau [5] concluded that reinforcement learning is not an efficient way of learning and recommended that the technique not be used for developing control strategies in general.

This paper aims to demonstrate that evolutionary reinforcement learning offers significant advantages for the development of high performance control strategies. It also serves to confirm that advancements in evolutionary reinforcement learning algorithms (particularly the SANE algorithm) have progressed to such an extent that their use is appropriate for the development of plant-wide control strategies in the chemical industries. It will be demonstrated that many of the difficulties present in both classical and advanced controller designs may be circumvented by the biological approach afforded by the SANE algorithm. The robust, high performance of the developed neurocontrollers is demonstrated for set point changes and in the presence of numerous simultaneous disturbances.

Symbiotic Adaptive Neuro-Evolution (SANE)

For operation in complex nonlinear environments it is desirable to design controllers that operate with greater independence from human interaction. The ability of a controller to remain autonomous is reflected in its robust performance despite an assortment of unexpected occurrences in its operating environment. Furthermore, an autonomous controller needs to maintain high and robust performance over a wide operating range. The success of biological organisms at completing a variety of complex tasks in uncertain environments remains a significant incentive and framework for the development of robust learning techniques and the use of biologically motivated generalisation tools (such as neural networks) for process control applications [6].

By way of analogy, continuous interaction with a dynamic environment is fundamental to the nature and mechanism of human learning. A newborn infant has no explicit tutor, but does have a direct sensorimotor connection to its environment. Interaction with its environment produces an abundance of information regarding cause and effect, the consequences of actions, and which behavioural patterns will lead to specific goals or rewards. Such knowledge of the environment provides the learner with the ability to change the environment through a particular pattern of behaviour [6].

Likewise, reinforcement learning is a computational approach to automating the learning and decision making process. It differs from other learning techniques in that the emphasis is on learning from direct interaction with the environment, without exemplary supervision or even complete models of the environment. A learning process is typically initialised with a randomly structured controller that is unfamiliar with the behavioural pattern that will lead to the successful completion of the proposed control task. Reinforcement learning involves a search for a controller structure that executes an appropriate set of actions yielding the highest possible reward.
For learning to occur, the controller needs to sense the state of the environment and learn from the way in which particular actions change the environment, resulting in a lesser or greater degree of reward. A clearly defined goal of what constitutes the successful completion of the required task must relate the desired environmental state to a particular level of reward. Reinforcement learning therefore establishes a framework for learning through interaction between a controller and the environment in terms of states, actions and rewards. It provides a means for programming controllers using cause and effect (reward and punishment) interactions, without explicitly needing to define how the goal is to be achieved. This is particularly beneficial where data or exemplars of behaviour are not available, as is often the case in the design of control systems for process plants [6].

[Figure: The reinforcement learning framework. At each time step the neurocontroller observes state s_t of the environment (the dynamic system), takes an action, and receives reward r_{t+1} together with the new state s_{t+1}.]
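To make the cooperative, neuron-level search concrete, the following is a minimal sketch of a SANE-style generation, assuming a hypothetical Gym-style wrapper env around the TE simulation (reset() returning the vector of process measurements, step() returning the new state, a reward and a termination flag). The population sizes, network shape and genetic operators are illustrative placeholders, not the settings used in the study.

    import numpy as np

    rng = np.random.default_rng(0)

    N_IN, N_OUT, N_HIDDEN = 41, 12, 8   # measurements in, manipulated variables out
    POP, NETS_PER_GEN = 200, 100        # neuron population; networks built per generation

    def new_neuron():
        # One chromosome encodes one hidden neuron: its input and output weights.
        return rng.normal(0.0, 0.5, size=N_IN + N_OUT)

    def forward(neurons, state):
        # Assemble a one-hidden-layer network from the sampled neurons.
        w_in = np.stack([n[:N_IN] for n in neurons])     # (hidden, inputs)
        w_out = np.stack([n[N_IN:] for n in neurons])    # (hidden, outputs)
        return np.tanh(np.tanh(w_in @ state) @ w_out)    # manipulated-variable settings

    def evaluate(neurons, env, steps=500):
        # Accumulated reward of one candidate network over a control episode.
        state, total = env.reset(), 0.0
        for _ in range(steps):
            state, reward, done = env.step(forward(neurons, state))
            total += reward
            if done:
                break
        return total

    def generation(pop, env):
        score_sum, counts = np.zeros(POP), np.zeros(POP)
        for _ in range(NETS_PER_GEN):
            idx = rng.choice(POP, size=N_HIDDEN, replace=False)
            score = evaluate([pop[i] for i in idx], env)
            score_sum[idx] += score     # implicit fitness sharing: a neuron is
            counts[idx] += 1            # scored by the networks it takes part in
        fitness = score_sum / np.maximum(counts, 1)
        pop = [pop[i] for i in np.argsort(fitness)[::-1]]
        # Breed replacements for the weaker half from the strongest quarter.
        for i in range(POP // 2, POP):
            a, b = rng.choice(POP // 4, size=2)
            cut = int(rng.integers(1, N_IN + N_OUT))     # one-point crossover
            child = np.concatenate([pop[a][:cut], pop[b][cut:]])
            pop[i] = child + rng.normal(0.0, 0.05, size=child.size)  # mutation
        return pop

    # population = [new_neuron() for _ in range(POP)]
    # for _ in range(n_generations): population = generation(population, env)

Because each neuron's fitness is averaged over the many different networks in which it participates, neurons that specialise in different facets of the control task can coexist, which is how the implicit fitness sharing described above preserves genetic diversity in the population.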


Publication date: 2001